Improving fuzzy matching through syntactic knowledge
Abstract
Fuzzy matching in translation memories (TM) is mostly string-based in current CAT tools. These tools look for TM sentences highly similar to an input sentence, using edit distance to detect the differences between sentences. Current CAT tools use limited or no linguistic knowledge in this procedure. In the recently started SCATE project, which aims at improving translators’ efficiency, we apply syntactic fuzzy matching in order to detect abstract similarities and to increase the number of fuzzy matches. We parse TM sentences in order to create hierarchical structures identifying constituents and/or dependencies. We calculate TER (Translation Error Rate) between an existing human translation of an input sentence and the translation of its fuzzy match in the TM. This allows us to assess the usefulness of syntactic matching with respect to string-based matching. First results hint at the potential of syntactic matching to lower TER scores for sentences with a low match score in a string-based setting.
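The string-based baseline described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenisation, a word-level Levenshtein distance, and a score of 1 minus the distance divided by the length of the longer sentence. These are common conventions, not necessarily the ones used by any particular CAT tool or by the SCATE project. The function names, the 0.7 threshold, and the sample TM entries are hypothetical.

```python
from typing import List, Optional, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def fuzzy_score(source: str, candidate: str) -> float:
    """Assumed convention: 1 - distance / length of the longer sentence."""
    a, b = source.split(), candidate.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(source: str,
               tm: List[Tuple[str, str]],
               threshold: float = 0.7) -> Optional[Tuple[float, str, str]]:
    """Return (score, tm_source, tm_target) for the best entry at or above threshold."""
    scored = [(fuzzy_score(source, src), src, tgt) for src, tgt in tm]
    scored = [s for s in scored if s[0] >= threshold]
    return max(scored, key=lambda s: s[0]) if scored else None


# Hypothetical two-entry translation memory (source, target).
tm = [
    ("the dog sat on the mat", "de hond zat op de mat"),
    ("open the file", "open het bestand"),
]
match = best_match("the cat sat on the mat", tm)
```

A purely string-based score like this one treats every token substitution alike, which is the limitation the abstract points to: two sentences with the same syntactic structure but different words receive a low score and may never be retrieved.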
Similar resources
Semantics-based pretranslation for SMT using fuzzy matches
Semantic knowledge has been adopted recently for SMT preprocessing, decoding and evaluation, in order to be able to compare sentences based on their meaning rather than on mere lexical and syntactic similarity. Little attention has been paid to semantic knowledge in the context of integrating fuzzy matches from a translation memory with SMT. We present work in progress which focuses on semantic...
Case-based Reasoning for Diagnosis of Stress using Enhanced Cosine and Fuzzy Similarity
Intelligent analysis of heterogeneous data and information sources for efficient decision support presents an interesting yet challenging task in clinical environments. This is particularly the case in stress medicine where digital patient records are becoming popular which contain not only lengthy time series measurements but also unstructured textual documents expressed in form of natural lan...
Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
Due to its explicit modeling of the grammaticality of the output via target-side syntax, the string-to-tree model has been shown to be one of the most successful syntax-based translation models. However, a major limitation of this model is that it does not utilize any useful syntactic information on the source side. In this paper, we analyze the difficulties of incorporating source syntax in a ...
Simple but Effective Approaches to Improving Tree-to-tree Model
The tree-to-tree translation model is widely studied in statistical machine translation (SMT) and is believed to have much potential to achieve promising translation quality. However, existing models still suffer from unsatisfactory performance due to limitations in both rule extraction and the decoding procedure. According to our analysis and experiments, we have found that tree-to-tree mode...
Transformation Rules for Knowledge-Based Pattern Matching
Many AI tasks require determining whether two knowledge representations encode the same knowledge. For example, rule-based classification requires matching rule antecedents with working memory; information retrieval requires matching queries with documents; and some knowledge-acquisition tasks require matching new information with already encoded knowledge to expand upon and debug both of them....
Publication date: 2015